The authors provide an authoritative lecture guide to Theory of Probability, in which they clearly state that the most useful material today is that contained in Chapters 3 and 5, which deal respectively with estimation and hypothesis testing. We argue that, from a contemporary viewpoint, the impact of Jeffreys's proposals on those two problems is rather different, and we describe what we perceive to be the current state of the question. We suggest that Jeffreys's dramatically different treatment of the two problems is not necessary, and that a joint objective approach to both is indeed possible.

1. INTRODUCTION 

As the authors point out, Theory of Probability is an indispensable, if often difficult to navigate, Bayesian foundational text. Their authoritative lecture guide is therefore very welcome. As should be clear from their review, the most useful material today is contained in Chapters 3 and 5. These deal, respectively, with estimation, in the sense of deriving an objective posterior distribution for the quantity of interest, and with hypothesis testing, presented as the derivation of an objective posterior probability for the hypothesis under consideration. I believe that, from a contemporary viewpoint, the impact of Jeffreys's proposals on those two problems is rather different, as I now briefly try to describe.

2. ESTIMATION 

One-parameter Jeffreys estimation prior (Jeffreys rule). Following his own pioneering work (Jeffreys, 1946), the book introduces in Section 3.10 what is now considered the main meaning of the confusing denomination "Jeffreys prior." To obtain an objective posterior density for the parameter $\theta$ of a probability model $f(x \mid \theta)$, he proposes the formal use in Bayes theorem of the (often improper) prior $\pi(\theta) \propto i(\theta)^{1/2}$, where $i(\theta)$ is the Fisher information function. As the authors point out, Jeffreys's motivation is rather obscure. He describes $i(\theta)$ as a second-order approximation to two functional distances, and notes that $i(\theta)^{1/2}$ happens to be invariant under one-to-one transformations. There is no trace of its more intuitive interpretation as the prior which assigns equal probabilities to equally distinguishable subregions of the parameter space (Lindley, 1961). Also, even in its third (1961) edition, the book gives only a cursory reference to the independent, essentially simultaneous derivation of the same "rule" by Perks (1947) in a much underrated paper. That said, the Jeffreys (or Jeffreys-Perks) rule is today the objective prior of choice for regular problems with one continuous parameter. In this simple case it has been justified from many different viewpoints, including coverage properties (Welch and Peers, 1963), minimum bias (Hartigan, 1965), data translation (Box and Tiao, 1973) and information-theoretical arguments (Bernardo, 1979; Berger, Bernardo and Sun, 2009). In one-parameter problems, Jeffreys left two classes of models unsolved. The first is non-regular models, for example those where the sample space depends on the parameter. The second is models with a discrete parameter, although he suggested a very interesting hierarchical argument to deal with the particular example of the hypergeometric distribution.
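The one-parameter rule and its invariance can be illustrated with a hedged Python sketch. It uses the Bernoulli model, an example chosen here for simplicity rather than taken from the text. The sketch computes $i(\theta) = 1/[\theta(1-\theta)]$ by summing the squared score over $x \in \{0, 1\}$. It then checks numerically that applying the rule directly to the log-odds $\phi = \log[\theta/(1-\theta)]$ gives the same density as transforming the $\theta$-prior by the Jacobian $d\theta/d\phi$. All function names are illustrative.

```python
import math


def bernoulli_fisher_info(theta: float) -> float:
    """i(theta) = E[(d/dtheta log f(x | theta))^2] for x ~ Bernoulli(theta)."""
    score_at_1 = 1.0 / theta           # d/dtheta log(theta)
    score_at_0 = -1.0 / (1.0 - theta)  # d/dtheta log(1 - theta)
    return theta * score_at_1**2 + (1.0 - theta) * score_at_0**2


def jeffreys_prior_theta(theta: float) -> float:
    """Unnormalized Jeffreys-rule prior, proportional to i(theta)^(1/2)."""
    return math.sqrt(bernoulli_fisher_info(theta))


def jeffreys_prior_logodds(phi: float) -> float:
    """The rule applied directly to phi = log(theta / (1 - theta)).

    In the log-odds parametrization the Fisher information is theta * (1 - theta).
    """
    theta = 1.0 / (1.0 + math.exp(-phi))
    return math.sqrt(theta * (1.0 - theta))


def transformed_prior_logodds(phi: float) -> float:
    """The theta-prior carried over to phi by the change-of-variables formula."""
    theta = 1.0 / (1.0 + math.exp(-phi))
    dtheta_dphi = theta * (1.0 - theta)
    return jeffreys_prior_theta(theta) * dtheta_dphi


def jeffreys_posterior_params(successes: int, trials: int) -> tuple:
    """Beta(1/2, 1/2) prior updated with s successes in n trials: Beta(s + 1/2, n - s + 1/2)."""
    return successes + 0.5, trials - successes + 0.5


# Invariance check: both routes give the same density on the log-odds scale.
for phi in (-3.0, -0.5, 0.0, 1.2, 4.0):
    assert math.isclose(jeffreys_prior_logodds(phi), transformed_prior_logodds(phi))
```

The resulting prior is the Beta(1/2, 1/2) density. Under it, $s$ successes in $n$ trials give a Beta$(s + 1/2, n - s + 1/2)$ posterior, which is proper for every possible data set.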

Many-parameter Jeffreys estimation prior (multiparameter Jeffreys rule). The arguments used to propose his rule for regular models with one continuous parameter extend to the corresponding multiparameter case. This leads to $\pi(\theta) \propto |I(\theta)|^{1/2}$, where $I(\theta)$ is now the Fisher information matrix. As the authors point out in their review, however, Jeffreys immediately realized that his multivariate rule does not generally produce sensible answers. He suggested ad hoc alternatives in virtually all the multiparameter examples he analyzed. The result is a plethora of "Jeffreys priors," so called because he proposed them, although they do not follow from his general rule. Moreover, like all Bayesians of his time, Jeffreys was working under the assumption that a unique objective prior would be appropriate for all inference problems within a multiparameter model. Stein's (1959) paradox already suggested that this could not possibly be true. It was the discovery of the marginalization paradoxes (Dawid, Stone and Zidek, 1973), however, that definitively established this as a fact, and reference priors (Bernardo, 1979) provided the first solution to the problem thus created.
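A hedged sketch for the normal model $N(x \mid \mu, \sigma)$ with both parameters unknown shows how the multivariate rule can mislead. This example is chosen here for illustration. The code evaluates the Fisher information matrix by Gauss-Hermite quadrature, using NumPy's `hermegauss`. It confirms that $|I(\mu, \sigma)|^{1/2} = \sqrt{2}/\sigma^2$, so the rule yields $\pi(\mu, \sigma) \propto \sigma^{-2}$.

A standard conjugate calculation, encoded in the hypothetical helper `sigma_posterior_dof`, covers priors of the form $\sigma^{-k}$. Under such a prior, the posterior distribution of $S/\sigma^2$, with $S = \sum_i (x_i - \bar{x})^2$, is $\chi^2_{n+k-2}$. The usual $\sigma^{-1}$ prior therefore gives $n - 1$ degrees of freedom. The multivariate rule gives $n$, as if $\mu$ were known.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def normal_fisher_matrix(sigma: float, deg: int = 20) -> np.ndarray:
    """Fisher information matrix of N(x | mu, sigma) in the (mu, sigma) parametrization.

    The expected outer product of the score is computed by Gauss-Hermite
    quadrature with weight exp(-z^2/2), writing x = mu + sigma * z.
    The matrix does not depend on mu, so mu is not an argument.
    """
    z, w = hermegauss(deg)
    w = w / np.sqrt(2.0 * np.pi)           # weights now sum to 1: E[g(Z)] ~ sum(w * g(z))
    score_mu = z / sigma                    # d/dmu log f(x | mu, sigma)
    score_sigma = (z**2 - 1.0) / sigma      # d/dsigma log f(x | mu, sigma)
    scores = np.vstack([score_mu, score_sigma])
    return (scores * w) @ scores.T


def multiparameter_jeffreys(sigma: float) -> float:
    """Unnormalized multivariate rule: pi(mu, sigma) proportional to |I(mu, sigma)|^(1/2)."""
    return float(np.sqrt(np.linalg.det(normal_fisher_matrix(sigma))))


def sigma_posterior_dof(n: int, k: int) -> int:
    """Degrees of freedom m with S / sigma^2 ~ chi^2_m a posteriori,
    under the prior pi(mu, sigma) proportional to sigma^(-k)."""
    return n + k - 2


for s in (0.5, 1.0, 3.0):
    # |I|^(1/2) = sqrt(2) / sigma^2, i.e. the rule gives a sigma^(-2) prior
    assert np.isclose(multiparameter_jeffreys(s) * s**2, np.sqrt(2.0))
```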

Proper posteriors. Scholars have often been surprised by the fact that, when applicable, Jeffreys-rule priors (even in their multiparameter version) typically produce proper posteriors for all data sets. This is a condition one should certainly require of any proposed objective prior for it to be permissible. I wonder whether the authors have an explanation for this remarkable fact, which is shared by reference priors. Curiously, as noted by the authors, in his analysis of the Poisson model Jeffreys uses the prior $\pi(\theta) \propto 1/\theta$ on a scale-invariance argument. Only later does he mention that it is the Jeffreys-rule prior, $\pi(\theta) \propto 1/\sqrt{\theta}$, that leads to a proper posterior for all possible data sets.
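The Poisson remark can be checked with a small conjugate calculation. The sketch below assumes a prior of the form $\pi(\theta) \propto \theta^{a-1}$. After $n$ counts with sum $s$, the posterior is proportional to $\theta^{s+a-1}e^{-n\theta}$. This is a Gamma density with shape $s + a$ and rate $n$, and it is integrable only when $s + a > 0$. With $a = 0$ (the $1/\theta$ prior), an all-zero sample gives an improper posterior. With $a = 1/2$ (the Jeffreys-rule prior, since $i(\theta) = 1/\theta$), the posterior is always proper. The function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def poisson_posterior(counts: Sequence[int], a: float) -> Optional[Tuple[float, float]]:
    """Gamma (shape, rate) posterior for a Poisson mean theta
    under the prior pi(theta) proportional to theta^(a - 1).

    a = 0   -> pi(theta) propto 1/theta         (scale-invariance argument)
    a = 1/2 -> pi(theta) propto theta^(-1/2)    (Jeffreys rule, i(theta) = 1/theta)
    The posterior is propto theta^(s + a - 1) exp(-n theta): integrable near 0
    iff s + a > 0, and at infinity iff n > 0. Returns None if it is improper.
    """
    n, s = len(counts), sum(counts)
    shape = s + a
    if n == 0 or shape <= 0:
        return None
    return shape, float(n)


print(poisson_posterior([0, 0, 0], a=0.0))   # 1/theta prior, all-zero data
print(poisson_posterior([0, 0, 0], a=0.5))   # Jeffreys-rule prior, same data
```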

3. HYPOTHESIS TESTING 

Jeffreys hypothesis testing priors. In Chapter 5, Jeffreys focuses on precise hypothesis testing. As the present review indicates, he does not produce a solution with the level of acceptance and generality of his one-parameter rule for estimation. Jeffreys intends to obtain a posterior probability for a precise null hypothesis. To do this, he is forced to use a mixed prior which puts a lump of probability $p = \Pr(H_0)$ on the null, say $H_0 = \{\theta = \theta_0\}$, and distributes the rest with a proper prior $p(\theta)$ (he mostly chooses $p = 1/2$). This has a very upsetting consequence, usually known as Lindley's paradox (Lindley, 1957). For any fixed prior probability $p$ independent of the sample size $n$, the procedure will wrongly accept $H_0$ whenever the likelihood is concentrated around a true parameter value which lies $O(n^{-1/2})$ from $H_0$. I find it difficult to accept a procedure which is known to produce the wrong answer under specific, but not controllable, circumstances; see Robert (1993) for a relatively recent review of this fascinating issue. Besides this problem, which I believe to be serious, Jeffreys's suggestion of a Cauchy density for the required proper prior is rather ad hoc and does not generalize to more complicated problems. There have been many attempts to define priors intended to obtain objective posterior probabilities for precise nulls. To the best of my knowledge, none of them has emerged as a clearly acceptable general solution.
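Lindley's paradox is easy to reproduce numerically. The sketch below rests on explicit assumptions: a normal model with known $\sigma$, $p = 1/2$, and a normal $N(\mu_0, \tau^2)$ prior under the alternative in place of Jeffreys's Cauchy density, chosen only because it gives a closed-form Bayes factor. The standardized statistic $z$ is held fixed, so that the sample mean sits $z\sigma/\sqrt{n}$ from $\mu_0$, and the code tracks $\Pr(H_0 \mid x)$ as $n$ grows. The function name is hypothetical.

```python
import math


def posterior_prob_null(z: float, n: int, sigma: float = 1.0,
                        tau: float = 1.0, p: float = 0.5) -> float:
    """Pr(H0 | x) for H0: mu = mu0 in a N(mu, sigma) model with sigma known.

    Mixed prior: mass p on mu0; under H1, mu ~ N(mu0, tau^2). The normal
    alternative is an illustrative assumption (Jeffreys proposed a Cauchy).
    z = (xbar - mu0) / (sigma / sqrt(n)) is the usual standardized statistic.
    """
    r = n * tau**2 / sigma**2
    # B01 = N(xbar | mu0, sigma^2/n) / N(xbar | mu0, sigma^2/n + tau^2)
    bf01 = math.sqrt(1.0 + r) * math.exp(-0.5 * z**2 * r / (1.0 + r))
    posterior_odds = (p / (1.0 - p)) * bf01
    return posterior_odds / (1.0 + posterior_odds)


# Keep z = 2.5 fixed (two-sided p-value about 0.012) and let n grow.
for n in (10, 100, 10_000, 1_000_000):
    print(n, round(posterior_prob_null(2.5, n), 3))
```

Under these assumptions, $\Pr(H_0 \mid x)$ increases towards 1 with $n$, even though the classical two-sided p-value stays at about 0.012 throughout.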

Hypothesis testing with conventional reference priors. To test whether or not a precise hypothesis $H_0$ is compatible with observed data, it is not necessary to try to obtain a posterior probability for $H_0$. Hence it is not necessary to use a totally different type of objective prior from that used for estimation. As forcefully argued by Jaynes (1980), all that is required is to obtain the posterior distribution of a quantity which measures the discrepancy between the true model and the null model. A very attractive candidate is the intrinsic discrepancy. The intrinsic discrepancy between two probability distributions $p_1$ and $p_2$ for $x$ is defined as

$$\delta\{p_1, p_2\} = \min\big[\kappa\{p_1 \mid p_2\},\ \kappa\{p_2 \mid p_1\}\big],$$

where $\kappa\{p_j \mid p_i\} = \int_{\mathcal{X}_i} p_i(x) \log[p_i(x)/p_j(x)]\,dx$ is the Kullback-Leibler (KL) divergence of $p_j$ from $p_i$. The intrinsic discrepancy inherits all the very nice properties of the KL divergence (it is non-negative, invariant and additive). It is also symmetric, and it is defined even if the supports of the two distributions are strictly nested. For instance, consider the canonical example of testing whether a random sample $x = \{x_1, \ldots, x_n\}$ from a normal $N(x \mid \mu, \sigma)$ is or is not compatible with the mean value $\mu_0$. One obtains

$$\delta\{\mu_0, (\mu, \sigma)\} = \delta\{N(x \mid \mu_0, \sigma), N(x \mid \mu, \sigma)\} = \frac{1}{2}\left(\frac{\mu - \mu_0}{\sigma/\sqrt{n}}\right)^2.$$
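The definition of $\delta$ and the normal formula can be checked numerically in a sketch under stated assumptions. First, the code takes two uniform distributions $U(0, a)$ and $U(0, b)$ with $a < b$, a toy example chosen here. It shows that $\delta$ stays finite even though one directed divergence is infinite because the supports are nested. Second, it recomputes the normal case by Gauss-Hermite quadrature, using additivity to scale a one-observation divergence by $n$. Function names are hypothetical.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def kappa_uniform(b_j: float, a_i: float) -> float:
    """kappa{p_j | p_i} = integral of p_i log(p_i / p_j), p_i = U(0, a_i), p_j = U(0, b_j)."""
    if a_i <= b_j:
        return math.log(b_j / a_i)   # integrand is the constant log(b_j / a_i) on (0, a_i)
    return math.inf                  # p_j vanishes on (b_j, a_i), where p_i > 0


def kappa_normal(mu_j: float, mu_i: float, sigma: float, n: int, deg: int = 20) -> float:
    """kappa{N(mu_j, sigma) | N(mu_i, sigma)} for a sample of size n.

    One-observation divergence by quadrature (x = mu_i + sigma * z),
    multiplied by n using additivity over independent observations.
    """
    z, w = hermegauss(deg)
    w = w / np.sqrt(2.0 * np.pi)
    x = mu_i + sigma * z
    log_ratio = ((x - mu_j) ** 2 - (x - mu_i) ** 2) / (2.0 * sigma**2)
    return n * float(np.sum(w * log_ratio))


def intrinsic_discrepancy(kappa_12: float, kappa_21: float) -> float:
    """delta{p1, p2} = min(kappa{p1 | p2}, kappa{p2 | p1})."""
    return min(kappa_12, kappa_21)


# Nested supports U(0, 1) and U(0, 2): one direction is infinite, delta = log 2.
print(intrinsic_discrepancy(kappa_uniform(1.0, 2.0), kappa_uniform(2.0, 1.0)))
```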
If $\sigma$ is known, the objective posterior distribution of $\delta\{\mu_0, (\mu, \sigma)\}$ obtained with the usual objective prior $\pi(\mu) = 1$ gives all the information required about whether or not the null $H_0 = \{\mu = \mu_0\}$ should be accepted, including the size of the plausible departures. If a formal decision is required, $\delta\{\mu_0, (\mu, \sigma)\}$ may be used as a loss function (it is an intrinsic loss in the sense of Robert, 1996). In this case, one simply computes its posterior expected value, which is the intrinsic test statistic

$$d(\mu_0 \mid x) = \frac{1}{2}\left(1 + z^2\right), \qquad z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}},$$

and rejects the null if this is too large: say, larger than $\log 100 \approx 4.6$, since this would imply that the data are expected to be over 100 times more likely under the true model than under the null model.
See Bernardo and Rueda (2002) and Bernardo (2005) for the general definition when there are nuisance parameters, and for many specific examples. One could certainly use other continuous loss functions. The point is that Bayesian testing of precise nulls does not necessarily require the use of mixed priors such as those suggested by Jeffreys for this problem. This has the nontrivial merit of allowing a single, unified theory for the derivation of objective "reference" priors to be used for both estimation and hypothesis testing.

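The intrinsic test statistic can be sketched in a few lines. The sketch assumes, as in the text, known $\sigma$ and the prior $\pi(\mu) = 1$, so that $\mu \mid x \sim N(\bar{x}, \sigma^2/n)$. The Monte Carlo function averages $\delta\{\mu_0, (\mu, \sigma)\}$ over posterior draws as a check on the closed form $(1 + z^2)/2$. The function `reject_null` applies the $\log 100$ threshold discussed in the text. All names are illustrative and not part of any library.

```python
import math
import random


def intrinsic_statistic(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """d(mu0 | x) = (1 + z^2) / 2 with z = (xbar - mu0) / (sigma / sqrt(n))."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return 0.5 * (1.0 + z**2)


def intrinsic_statistic_mc(xbar: float, mu0: float, sigma: float, n: int,
                           draws: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo posterior mean of delta{mu0, (mu, sigma)} = n (mu - mu0)^2 / (2 sigma^2).

    Under pi(mu) = 1 with sigma known, the posterior is mu | x ~ N(xbar, sigma^2 / n).
    """
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    total = 0.0
    for _ in range(draws):
        mu = rng.gauss(xbar, se)
        total += 0.5 * ((mu - mu0) / se) ** 2
    return total / draws


def reject_null(d: float, threshold: float = math.log(100.0)) -> bool:
    """Reject H0 when the expected intrinsic loss exceeds log(100), about 4.6."""
    return d > threshold


d = intrinsic_statistic(xbar=0.5, mu0=0.0, sigma=1.0, n=16)   # z = 2
print(d, intrinsic_statistic_mc(0.5, 0.0, 1.0, 16), reject_null(d))
```

The statistic $d(\mu_0 \mid x)$ depends on the data only through $z$. The rule therefore rejects exactly when $|z|$ exceeds $\sqrt{2\log 100 - 1} \approx 2.87$, whatever the sample size.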
ACKNOWLEDGMENT 

Work supported in part by MEC Grant MTM2006-07801, Spain.

REFERENCES 

Berger, J., Bernardo, J. M. and Sun, D. (2009). The formal definition of reference priors. Ann. Statist. 37 905-938. MR2502655

Bernardo, J. M. (1979). Reference posterior distributions for Bayesian inference. J. Roy. Statist. Soc. Ser. B 41 113-147 (with discussion). Reprinted in Bayesian Inference (N. G. Polson and G. C. Tiao, eds.) 229-263. Edward Elgar, Brookfield, VT, 1995.

Bernardo, J. M. (2005). Reference analysis. In Handbook of Statistics (D. K. Dey and C. R. Rao, eds.) 25 17-90. Elsevier, Amsterdam.

Bernardo, J. M. and Rueda, R. (2002). Bayesian hypothesis testing: A reference approach. Internat. Statist. Rev. 70 351-372.

Box, G. E. P. and Tiao, G. C. (1973). Bayesian Inference in Statistical Analysis. Addison-Wesley, Reading, MA. MR0418321

Dawid, A. P., Stone, M. and Zidek, J. V. (1973). Marginalization paradoxes in Bayesian and structural inference (with discussion). J. Roy. Statist. Soc. Ser. B 35 189-233.

Hartigan, J. A. (1965). The asymptotically unbiased prior distribution. Ann. Math. Statist. 36 1137-1152. MR0176539

Jaynes, E. T. (1980). Comments on hypothesis testing. In Bayesian Statistics (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 618-629. Valencia Univ. Press, Valencia. MR0638871

Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proc. Roy. Soc. London Ser. A 186 453-461. MR0017504

Lindley, D. V. (1957). A statistical paradox. Biometrika 44 187-192.

Lindley, D. V. (1961). The use of prior probability distributions in statistical inference and decision. In Proc. Fourth Berkeley Symp. Math. Statist. Probab. (J. Neyman and E. L. Scott, eds.) 4 453-468. Univ. California Press, Berkeley. MR0156437

Perks, W. (1947). Some observations on inverse probability, including a new indifference rule (with discussion). J. Inst. Actuaries 73 285-334. MR0025103

Robert, C. P. (1993). A note on Jeffreys-Lindley paradox. Statist. Sinica 3 603-608. MR1243404

Robert, C. P. (1996). Intrinsic loss functions. Theory and Decision 40 192-214.

Stein, C. (1959). An example of wide discrepancy between fiducial and confidence intervals. Ann. Math. Statist. 30 877-880. MR0125680

Welch, B. L. and Peers, H. W. (1963). On formulae for confidence points based on intervals of weighted likelihoods. J. Roy. Statist. Soc. Ser. B 25 318-329.



